CIS 432 Homework 5
Simon Business School
CIS432 [33B]
PREDICTIVE ANALYTICS USING PYTHON
Prof Yaron Shaposhnik
Spring B
Homework 5
| Name | Email-Id | Student No. |
|---|---|---|
| Neeraja Menon | nmenon2_simon@simon.rochester.edu | 31318631 |
| Neha Jayachandran | njayacha@simon.rochester.edu | 32275841 |
| Saptarishi Pandey | spandey3@simon.rochester.edu | 32371505 |
Introduction
In this assignment, we develop a predictive decision support system that evaluates the risk of Home Equity Line of Credit (HELOC) applications.
Based on this predictive model, we intend to design an interactive interface that sales representatives at a bank or credit card company can use to decide whether to accept or reject an application.
About the dataset
To familiarize ourselves with the use case and the nuances of the dataset, we researched it further. According to the FICO website, credit scores are an important input when financial institutions evaluate the risk of a loan. The scores are designed to predict the likelihood of loan repayment. When a loan is rejected, regulators require the institution to tell the customer why, and customers also demand explanations for their scores. Models that are not interpretable are unlikely to be deployed in the real world because they do not meet regulatory standards. Thus, one of our goals is to make our predictive model interpretable, so the sales representative using the interface can give a reasonable explanation of why a customer’s loan application was rejected.
HELOC
The HELOC allows property owners to take loans using the equity in their property as collateral. When a customer applies for a HELOC, the financial institution appraises their property and subtracts any mortgages. The remainder is the home equity which becomes the maximum amount that can be borrowed (credit limit). Because a home is often a consumer’s most valuable asset, many homeowners use HELOCs only for major items such as home improvement, medical bills, or education, unlike a credit card that is generally used for day-to-day expenses.
The customers in this dataset have requested a credit line in the range of $5,000 to $150,000. The objective of the model is to predict whether the customers will repay their HELOC account within 2 years. This prediction is then used to decide whether the homeowner qualifies for a HELOC and, if so, how much credit should be extended.
The target variable to predict is a binary variable called RiskPerformance. A Bad RiskPerformance means that the consumer was 90 days past due or worse at least once over a period of 24 months from when the credit account was opened. A Good RiskPerformance means that they made their payments without ever being more than 90 days overdue.
| Variable | Explanation |
|---|---|
| RiskPerformance | Binary target |
| ExternalRiskEstimate | Consolidated indicator of risk markers |
| MSinceOldestTradeOpen | Number of months that have elapsed since the first trade |
| MSinceMostRecentTradeOpen | Number of months that have elapsed since the last opened trade |
| AverageMInFile | Average months in file |
| NumSatisfactoryTrades | Number of satisfactory trades |
| NumTrades60Ever2DerogPubRec | Number of trades which are more than 60 days past due |
| NumTrades90Ever2DerogPubRec | Number of trades which are more than 90 days past due |
| PercentTradesNeverDelq | Percent of trades that were not delinquent |
| MSinceMostRecentDelq1 | Number of months that have elapsed since the last delinquent trade |
| MaxDelq2PublicRecLast12M | The longest delinquency period in the last 12 months |
| MaxDelqEver | The longest delinquency period |
| NumTotalTrades | Total number of trades |
| NumTradesOpeninLast12M | Number of trades opened in the last 12 months |
| PercentInstallTrades | Percent of installment trades |
| MSinceMostRecentInqexcl7days | Months since last inquiry (excluding the last 7 days) |
| NumInqLast6M | Number of inquiries in the last 6 months |
| NumInqLast6Mexcl7days | Number of inquiries in the last 6 months (excluding the last 7 days) |
| NetFractionRevolvingBurden | Revolving balance divided by credit limit |
| NetFractionInstallBurden | Installment balance divided by original loan amount |
| NumRevolvingTradesWBalance | Number of revolving trades with balance |
| NumInstallTradesWBalance | Number of installment trades with balance |
| NumBank2NatlTradesWHighUtilization | Number of trades with high utilization ratio (the amount of a credit card balance compared to the credit limit) |
| PercentTradesWBalance | Percent of trades with balance |
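As a small illustration of the binary target described above, the sketch below maps the Good/Bad labels to integers with pandas. The choice of 1 for Bad and 0 for Good is our assumption for this example, and the four labels are made-up toy values, not rows from the dataset.

```python
import pandas as pd

# Toy labels standing in for the RiskPerformance column (illustrative values only).
labels = pd.Series(["Bad", "Good", "Good", "Bad"], name="RiskPerformance")

# Assumed convention for this sketch: Bad -> 1 (the risky class), Good -> 0.
y_encoded = labels.map({"Good": 0, "Bad": 1})

print(y_encoded.tolist())  # [1, 0, 0, 1]
```

With this convention, the mean of the encoded column is the share of Bad accounts.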
Data exploration
Upon inspecting the data, we noticed some interesting patterns. Clearly, certain values are special encodings: many features contain the values -7, -8, or -9.
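One way to see how widespread these encodings are is to count each special code per feature. The following is a minimal sketch with pandas; the small DataFrame is a made-up stand-in for the HELOC data (the column names come from the variable table, the values are illustrative), and `SPECIAL_CODES` is a name we introduce here.

```python
import pandas as pd

# The three special encodings observed in the features.
SPECIAL_CODES = [-9, -8, -7]

# Toy rows standing in for the HELOC features (values are made up).
df = pd.DataFrame({
    "ExternalRiskEstimate": [55, -9, 61, 72],
    "MSinceMostRecentInqexcl7days": [-7, -9, 2, -8],
    "NumInqLast6M": [0, -9, 1, 3],
})

# One row per feature, one column per special code, each cell a count.
special_counts = pd.DataFrame(
    {code: (df == code).sum() for code in SPECIAL_CODES}
)

print(special_counts)
```

Features with a high count for a given code are the ones that need the most care during preprocessing.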
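Let’s create the training and testing sets.
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=1234)
Preprocessing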
Special Values
This case is the same as the one we had in Assignment 2: missing data is marked with special encodings. We created special pipelines to deal with them.
From the data dictionary, we know that
| Value | Explanation |
|---|---|
| -9 | No Bureau record or investigation |
| -8 | No usable/valid trades or enquiries |
| -7 | Condition not met (no inquiries/no delinquencies) |
A value of -9 means that there is no bureau record or no investigation for the applicant. We therefore designed our model to make no prediction when the value is -9.
The pairplot below shows how each feature relates to the others.
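The idea of setting aside -9 records can be sketched as follows. This is not the pipeline we built, only a simplified illustration under stated assumptions: rows in which every feature equals -9 are treated as having no bureau record and are excluded from scoring, and the remaining special codes are turned into missing values so that a later imputation step can handle them. The helper name `split_special_values` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def split_special_values(features: pd.DataFrame):
    """Separate rows with no bureau record and mask the other special codes."""
    # Assumption: a row with -9 in every feature has no bureau record at all.
    no_record = (features == -9).all(axis=1)
    # Keep the remaining rows and mark any special code as missing.
    scorable = features.loc[~no_record].replace([-7, -8, -9], np.nan)
    return scorable, no_record

# Toy rows standing in for the HELOC features (values are made up).
toy = pd.DataFrame({
    "ExternalRiskEstimate": [55, -9, 61],
    "MSinceMostRecentInqexcl7days": [-7, -9, 2],
    "NumInqLast6M": [0, -9, -8],
})

scorable, no_record = split_special_values(toy)
print(no_record.tolist())  # [False, True, False]
print(scorable)
```

Returning the `no_record` mask alongside the cleaned features lets an interface report "no prediction available" for those applicants instead of silently dropping them.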